Abstract

Given  a  set  of  n  objects,  each  characterized  by  d  attributes  specified  at  m  fixed  time 
instances,  we  are  interested  in  the  problem  of  designing  space  efficient  indexing  structures 
such  that  arbitrary  temporal  range  search  queries  can  be  handled  efficiently.  When  m  =  1, 
our  problem  reduces  to  the  d-dimensional  orthogonal  search  problem.  We  establish  efficient 
data  structures  to  handle  several  classes  of  the  general  problem.  Our  results  include  a  linear 
size data structure that enables a query time of $O(\log n \log m / \log\log n + f)$ for one-sided
queries when d = 1, where f is the number of objects satisfying the query. A similar result
is  shown  for  counting  queries.  We  also  show  that  the  most  general  problem  can  be  solved 
with  a  polylogarithmic  query  time  using  nonlinear  space  data  structures. 


1  Introduction 

In  this  paper,  we  introduce  a  framework  for  exploring  temporal  patterns  of  a  set  of  objects 
and  discuss  the  design  of  indexing  structures  for  handling  temporal  orthogonal  range  queries 
in  such  a  framework.  We  assume  that  each  object  is  characterized  by  a  set  of  attributes, 
whose  values  are  given  for  a  sequence  of  time  snapshots.  The  temporal  patterns  of  interest 
can  be  defined  as  the  values  of  certain  attributes  remaining  within  certain  bounds,  changing 
according  to  a  given  pattern  (say  increasing  or  decreasing),  or  satisfying  certain  statistical 
distributions.  We  focus  here  on  temporal  patterns  characterized  by  orthogonal  range  values 
over  the  attributes.  More  specifically,  we  are  aiming  to  design  indexing  structures  to  quickly 
find objects whose attributes fall within a set of ranges during a given time period specified
at query time. In the dynamic case, either objects or time snapshots can be added or deleted.
Our framework is very general and encompasses problems in multidimensional range search
and temporal range search for data time series.

(... Computational Infrastructure (NPACI), DoD-MD Procurement under contract MDA90402C0428, and NASA ...)

More formally, let S be a set of n objects $\{O_1, O_2, \ldots, O_n\}$, each of which is characterized
by a set of d attributes whose values change over time. We are given m snapshots of each
object at time instances $t_1, t_2, \ldots, t_m$. The set of values of the d attributes of object $O_i$ at
time instance $t_j$ is denoted as a vector $v(i, j) = [v_i^j(1), v_i^j(2), \ldots, v_i^j(d)]$.

We  are  interested  in  developing  a  data  structure  for  S  so  that  the  following  types  of 
queries,  called  temporal  range  queries ,  can  be  handled  very  quickly: 

Given two vectors $a = [a_1, a_2, \ldots, a_d]$ and $b = [b_1, b_2, \ldots, b_d]$, and two time
instances $t_s$ and $t_e$, find the set Q of objects such that for every $O_i \in Q$,
$a_k \le v_i^j(k) \le b_k$ for all $1 \le k \le d$ and $t_s \le t_j \le t_e$.

Note  that  the  general  multidimensional  orthogonal  range  search  is  a  special  case  of  our 
problem corresponding to a single time snapshot. We measure the complexity in
terms of the storage cost of the data structure and the query time as functions of n, m, and
d, where d is typically considered to be a constant.
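To make the query semantics concrete, the following is a minimal brute-force sketch in Python. It is only a reference against which faster structures can be checked, not one of the data structures of this paper; the function name `temporal_range_query`, the nested-list layout `values[i][j][k]`, and the sample data are assumptions made for illustration. It runs in O(nmd) time per query.

```python
from typing import List

def temporal_range_query(values, times, a, b, ts, te) -> List[int]:
    """Return indices i of objects whose every attribute k stays in
    [a[k], b[k]] at every snapshot time t with ts <= t <= te.

    values[i][j][k] is attribute k of object i at snapshot j (assumed layout);
    times[j] is the time instance of snapshot j, in increasing order.
    """
    hits = []
    for i, snapshots in enumerate(values):
        inside = all(
            a[k] <= snap[k] <= b[k]
            for t, snap in zip(times, snapshots)
            if ts <= t <= te
            for k in range(len(a))
        )
        if inside:
            hits.append(i)
    return hits

INF = float("inf")
times = [1965, 1970, 1975, 1980]
# Hypothetical (precipitation, temperature) readings for three regions.
values = [
    [(45, 72), (50, 75), (41, 71), (30, 60)],
    [(45, 72), (35, 75), (41, 71), (50, 80)],
    [(20, 65), (50, 75), (41, 71), (45, 72)],
]
print(temporal_range_query(values, times, (40, 70), (INF, INF), 1965, 1975))
```

Note that if no snapshot falls inside the query window, the condition holds vacuously and every object is returned.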

Many  applications  fall  in  a  natural  way  under  our  general  framework.  The  following  is  a 
list  of  a  few  such  examples. 

•  Climatologists  are  often  interested  in  studying  the  climate  change  patterns  for  certain 
geographical  areas,  each  characterized  by  a  set  of  environmental  variables  such  as 
temperature,  precipitation,  humidity,  etc.  Given  a  time  series  of  such  information  for 
n  regions,  one  would  like  to  quickly  explore  relationships  among  such  regions  by  asking 
queries  of  the  following  type:  determine  the  regions  where  the  annual  precipitation  is 
above  40  inches  and  the  summer  temperature  is  above  70°  F  between  the  years  1965 
and  1975. 

•  In  the  stock  market,  each  stock  can  be  characterized  by  its  daily  opening  price,  closing 
price,  and  trading  volume.  Related  interesting  queries  that  fall  under  our  framework 
are  of  the  following  type:  determine  the  stocks,  each  of  whose  daily  opening  price  is 
less  than  $2  and  whose  daily  trading  volume  is  larger  than  200  million  shares  during 
the  year  2000. 

•  As  an  application  related  to  data  warehousing,  consider  a  retail  chain  that  has  stores 
across  the  country,  each  of  which  reports  their  sales  on  a  daily  basis.  A  typical  query 
will  for  example  be  to  identify  the  stores  whose  sales  exceeded  $100,000  for  each  of  the 
past  12  months. 

•  Consider  a  set  of  n  cities,  each  characterized  by  annual  demographic  and  health  data, 
for  a  period  of  30  years.  In  exploring  patterns  among  these  cities,  one  may  be  interested 
in  asking  queries  about  the  number  of  cities  that  had  a  high  cancer  rate  and  a  high 
ozone  level  between  1990  and  2000. 

The d-dimensional orthogonal range search problem, which is a special case of our problem,
has been studied extensively in the literature. The best results achieve linear space
and polylogarithmic query time for three-sided reporting queries and four-sided counting
queries for d = 2 [13, 3], and for dominance reporting queries for d = 3. Otherwise, all
fast query time algorithms require nonlinear space, sometimes coupled with matching lower
bounds under certain computational models [2, 5, 4]. Note that we cannot treat our problem as
an orthogonal range search problem by simply treating the time snapshots as just an extra
dimension added to the d dimensions corresponding to the attributes. This is the case since
the values of an object's attributes at different time instances cannot be treated simply as
independent of each other. However, we can combine all the attribute values of an object
together to specify that object, resulting in an (md)-dimensional range search problem, which
is clearly quite undesirable, especially for large m.

Another  related  class  of  problems  that  have  been  studied  in  the  literature,  especially  the 
database  literature,  deals  with  a  time  series  of  data  by  appending  a  time  stamp  to  each  piece 
of  data  separately.  Hence  such  an  approach  will  be  quite  inefficient  to  capture  temporal 
information  about  single  objects  since  it  will  have  to  process  the  values  at  all  the  time 
steps  between  ts  and  te.  Examples  of  such  techniques  include  those  based  on  persistent  data 
structures  [6],  such  as  the  Multiversion  B-tree  [10]  and  the  Multiversion  Access  Methods  [20], 
and  the  Overlapping  B+-trees  [12]  and  its  extensions  such  as  the  Historical  R-tree  [14],  the 
HR+-tree  [17],  and  the  Overlapping  Linear  Quadtrees  [18,  19].  To  answer  a  query  that 
involves  a  time  period,  the  query  time  of  these  methods  will  often  depend  on  the  length  of 
the  time  period,  which  is  undesirable  for  our  general  problem  since  the  temporal  range  query 
could  cover  a  very  long  time  period  characterized  by  the  two  parameters  ts  and  te. 

Another related topic involves the so-called kinetic data structures, which are used for
indexing moving objects. Queries similar to ours involving both time periods and positions
of objects have been studied, for example in the work of Agarwal et al. [1] and Saltenis et
al. [15]. However, the objects are considered there to be points moving along a straight line
and at a constant speed. As a result, the positions of the objects need not be explicitly
stored.  In  our  case,  such  a  problem  will  be  formulated  as  the  positions  of  each  object  at 
different  time  instances  (that  are  the  same  for  all  the  objects),  without  any  assumption  about 
expected  trajectories  or  speeds. 

Before  stating  our  main  results,  let  us  introduce  two  main  variations  of  temporal  range 
queries, which are similar to those appearing in orthogonal range search queries. The reporting
query requires that a list of the objects (or their indices) be generated as an answer to
the query, while the counting query requires only that the number of objects satisfying
the query be generated. Our results include the following:

• A linear space data structure that handles temporal range queries for a single object
in $O(1)$ time, assuming the number d of attributes is constant.

• Two data structures that handle temporal one-sided range reporting queries for a set
of objects in $O(\log m \log n + f)$¹ and $O(\log m \log n / \log\log n + f)$ time respectively,
the first using $O(nm)$ space, and the second using $O(mn \log^{\epsilon} n)$ space, where f is the number
of objects satisfying the query, and d = 1.

• Two data structures that use $O(nm \log(nm))$ and $O(nm \log^{1+\epsilon}(nm))$ space respectively
to answer the temporal one-sided range counting queries. The first data structure
enables $O(\log^2(nm))$ query time and the second enables $O((\log(nm) / \log\log(nm))^2)$
time, under the assumption that d = 1.

• By a reduction to the 2d-dimensional dominance problem, the most general problem
can be solved in polylogarithmic query time using $O(nm^2 \,\mathrm{polylog}(n))$ space. When m
is extremely large, we show that it is possible to use $o(nm^2)$ space to achieve
polylogarithmic query time.

¹ In this paper, we always assume the logarithmic operations to be of base 2.

Before proceeding, we note that the actual time instances $\{t_1, t_2, \ldots, t_m\}$ can be replaced
by their subscripts $\{1, 2, \ldots, m\}$. By doing so, we introduce the additional complexity of
having to convert $t_s$ and $t_e$ specified by the query to $l_1$ and $l_2$ respectively, where $t_{l_1}$ is
the first time instance no earlier than $t_s$ and $t_{l_2}$ is the last time instance no later than $t_e$.
This conversion can be done in $O(\log m)$ time and $O(m)$ space using binary search, or by an
asymptotically faster $O(\log m / \log\log m)$ algorithm with a larger constant behind the big-O
notation and the same $O(m)$ space using the fusion tree of Fredman and Willard [7]. In
the remainder of this paper, we assume that the time instances are represented by integers
$\{1, 2, \ldots, m\}$ and the time interval in the query is represented by two integers $l_1$ and $l_2$. For
brevity, we will use $[i..j]$ to denote the set of integers $\{i, i+1, \ldots, j\}$. Without causing
confusion, we will call the set of contiguous integers $[i..j]$ a time period.
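The binary-search variant of this conversion can be sketched as follows, using Python's standard `bisect` module. The function name `to_index_range` and the sample time instances are assumptions for illustration; indices are 1-based to match the convention adopted above, and an empty window is signalled by a first index larger than the second.

```python
from bisect import bisect_left, bisect_right

def to_index_range(times, ts, te):
    """Map a query window [ts, te] to 1-based snapshot indices (l1, l2).

    times must be sorted increasingly. l1 is the first instance no earlier
    than ts and l2 the last instance no later than te; l1 > l2 means that
    no snapshot falls inside the window.
    """
    l1 = bisect_left(times, ts) + 1   # first position with times[pos] >= ts
    l2 = bisect_right(times, te)      # number of instances <= te
    return l1, l2

times = [1965, 1970, 1975, 1980]
print(to_index_range(times, 1966, 1979))  # (2, 3)
```

Both calls take O(log m) time on an array of m sorted time instances, which matches the binary-search bound quoted in the text.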

The  remainder  of  the  paper  is  organized  as  follows.  The  next  section  discusses  a  special 
version  of  the  temporal  range  search  problem,  which  involves  only  a  single  object.  The  data 
structure  for  the  reporting  case  of  temporal  one-sided  range  queries  is  covered  in  Section  3, 
while  the  counting  version  is  covered  in  Section  4.  In  Section  5,  we  deal  with  the  two-sided 
temporal  range  query,  and  conclude  in  Section  6. 


2  Preliminaries:  Handling  Range  Queries  of  a  Single 
Object 

Consider the case of temporal range queries involving only a single object O. We provide
a simple solution to this case, which will be used to handle the more general case. Let the
values of the attributes of O at time instance j be $[v^j(1), v^j(2), \ldots, v^j(d)]$. Given two real
vectors $a = [a_1, a_2, \ldots, a_d]$ and $b = [b_1, b_2, \ldots, b_d]$, and two time instances $l_1$ and $l_2$, we will
describe an efficient method to test whether the following predicate holds:

P: For every time instance j that satisfies $l_1 \le j \le l_2$, $a_k \le v^j(k) \le b_k$ for all
k between 1 and d.

Since we are assuming that d is a fixed constant, we can restrict ourselves to the following
case. Let the object O be specified by $[v^1, v^2, \ldots, v^m]$, where each $v^j$ is a real number.
We develop a data structure that can be used to test the following predicate for any given
parameters $l_1$, $l_2$, and a:

P': For every time instance j satisfying $l_1 \le j \le l_2$, $v^j \ge a$.

We start by making the following straightforward observation.

Observation 1 A predicate of type P' is true if and only if $\min\{v^j \mid j \in [l_1..l_2]\} \ge a$.

Using this observation, our problem is reduced to finding the minimum value $v^j$ of the object
during the time period $[l_1..l_2]$ and comparing it against the value of a.

The problem of finding the minimum value in the time period $[l_1..l_2]$ can be reduced to the
problem of finding the nearest common ancestor in the so-called Cartesian tree, as described
in [8].

A Cartesian tree [21] for a sequence of m real numbers is a binary tree with m nodes. In
our case, a Cartesian tree for time instances $[l..r]$ with $l \le r$ has $r - l + 1$ nodes. The root
stores the smallest value $v^i$ over the time period $[l..r]$. If there are multiple $v^i$'s with the
smallest value, the earliest one is chosen to be stored at the root. The left subtree of the
root is the Cartesian tree for time instances $[l..(i-1)]$ and the right subtree is the Cartesian
tree for the time instances $[(i+1)..r]$. The left (resp. right) tree is null if $i = l$ (resp. $i = r$).
The tree nodes are labeled l through r according to the in-order traversal of the tree (which
corresponds to their time instances). Figure 1 gives an example of the Cartesian tree.
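The recursive definition above can be realized in linear time with a single left-to-right pass and a stack. The sketch below is one standard way to do this, not a construction taken from the paper; the function name `build_cartesian_tree` is an assumption, and nodes are indexed from 0, so node k corresponds to time instance k + 1. Popping only strictly larger values keeps the earliest of several equal minima nearest the root, as the definition requires.

```python
def build_cartesian_tree(v):
    """Build a minimum Cartesian tree of the sequence v.

    Returns (root, left, right), where left[k] and right[k] are the child
    indices of node k, or None. Indices are 0-based positions in v.
    """
    m = len(v)
    left, right = [None] * m, [None] * m
    stack = []  # indices on the rightmost path, values non-decreasing
    for k in range(m):
        last = None
        # Pop strictly larger values; equal earlier values stay as ancestors.
        while stack and v[stack[-1]] > v[k]:
            last = stack.pop()
        left[k] = last               # everything popped hangs to the left of k
        if stack:
            right[stack[-1]] = k     # k becomes the new right child
        stack.append(k)
    root = stack[0] if stack else None
    return root, left, right

# The sequence used in Figure 1.
root, left, right = build_cartesian_tree([8, 4, 6, 3, 5, 1, 7, 8])
print(root + 1)  # time instance stored at the root
```

For the Figure 1 sequence the root is time instance 6 (value 1), with time instance 4 (value 3) as its left child and time instance 7 (value 7) as its right child.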


Figure 1: A Cartesian tree for the sequence [8, 4, 6, 3, 5, 1, 7, 8]. The number outside each
node represents the time instance of the attribute value stored at the node.

It is easy to see that the smallest value among $\{v^i, \ldots, v^j\}$ is the one stored in the
nearest common ancestor of nodes i and j. This problem was addressed in [9], where the
following result is shown.
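The correspondence between range minima and nearest common ancestors can be checked with a small sketch. It computes the ancestor by walking parent pointers, which takes time proportional to the tree depth; this is a deliberately simple stand-in and not the constant-time method of [9]. The names `cartesian_parents`, `nearest_common_ancestor`, and `range_min` are assumptions for illustration.

```python
def cartesian_parents(v):
    """Parent array of the minimum Cartesian tree of v (0-based, root = -1)."""
    parent = [-1] * len(v)
    stack = []
    for k in range(len(v)):
        last = -1
        while stack and v[stack[-1]] > v[k]:
            last = stack.pop()
        if last != -1:
            parent[last] = k          # popped chain becomes k's left subtree
        if stack:
            parent[k] = stack[-1]     # k is the right child of the stack top
        stack.append(k)
    return parent

def node_depth(parent, x):
    d = 0
    while parent[x] != -1:
        x, d = parent[x], d + 1
    return d

def nearest_common_ancestor(parent, x, y):
    dx, dy = node_depth(parent, x), node_depth(parent, y)
    while dx > dy:
        x, dx = parent[x], dx - 1
    while dy > dx:
        y, dy = parent[y], dy - 1
    while x != y:
        x, y = parent[x], parent[y]
    return x

def range_min(v, parent, l1, l2):
    """Minimum of v over the 1-based time period [l1..l2]."""
    return v[nearest_common_ancestor(parent, l1 - 1, l2 - 1)]

v = [8, 4, 6, 3, 5, 1, 7, 8]
parent = cartesian_parents(v)
print(range_min(v, parent, 2, 3))  # 4
```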

Lemma 1 Given a collection of rooted trees with n vertices, the nearest common ancestor
of any two vertices can be found in $O(1)$ time, provided that pointers to these two vertices
are given as input. This algorithm uses $O(n)$ space.

It is easy to see that if a tree is complete, we can solve the nearest common ancestor
problem in linear space and constant time by labeling the tree nodes in the order of the
in-order traversal and performing bit operations on the labels corresponding to the two vertices.
Harel and Tarjan solve the same problem for an arbitrary tree by first transforming
it into a compressed tree of logarithmic depth, augmenting its subtrees into complete
trees without asymptotically increasing the overall storage cost, and applying the technique
for complete trees. For details see [9].

Given  the  above  lemma,  we  immediately  have  the  following  result. 

Theorem 1 Predicate P' can be evaluated in $O(1)$ time using an $O(m)$ space data structure.

If  we  build  a  Cartesian  tree  where  an  internal  node  stores  the  maximum  instead  of  the 
minimum  value,  we  can  evaluate  predicates  involving  upper  bounds  instead  of  lower  bounds. 
We  call  the  former  Cartesian  tree  a  minimum  Cartesian  tree  and  the  latter  a  maximum 
Cartesian  tree.  By  building  both  the  minimum  and  the  maximum  Cartesian  trees  for  each 
of  the  d  attributes,  we  will  be  able  to  evaluate  the  general  P  predicates  in  linear  space  and 
constant  time,  which  is  optimal. 

Corollary 1 A P predicate can be evaluated in $O(1)$ time using an $O(m)$ space data structure.
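As an illustration of how a P predicate is evaluated from per-attribute range minima and maxima, the sketch below uses a sparse table in place of the Cartesian tree with constant-time nearest common ancestor queries. This is a swapped-in technique chosen for brevity: it also answers each range query in constant time, but it needs O(m log m) space per attribute rather than the O(m) of Corollary 1. The names `SparseTable`, `build_index`, and `predicate_P` are assumptions.

```python
class SparseTable:
    """Constant-time range min or max after O(m log m) preprocessing."""

    def __init__(self, values, op):
        self.op = op
        self.rows = [list(values)]   # rows[k][i] covers positions i .. i + 2^k - 1
        span = 1
        while 2 * span <= len(values):
            prev = self.rows[-1]
            self.rows.append([op(prev[i], prev[i + span])
                              for i in range(len(values) - 2 * span + 1)])
            span *= 2

    def query(self, lo, hi):
        """Apply op over the 0-based inclusive range [lo, hi]."""
        k = (hi - lo + 1).bit_length() - 1
        row = self.rows[k]
        return self.op(row[lo], row[hi - (1 << k) + 1])

def build_index(snapshots):
    """snapshots[j][k] is attribute k at time instance j + 1 (assumed layout)."""
    d = len(snapshots[0])
    mins = [SparseTable([s[k] for s in snapshots], min) for k in range(d)]
    maxs = [SparseTable([s[k] for s in snapshots], max) for k in range(d)]
    return mins, maxs

def predicate_P(index, a, b, l1, l2):
    """True iff a[k] <= v^j(k) <= b[k] for all k and all j in [l1..l2] (1-based)."""
    mins, maxs = index
    return all(a[k] <= mins[k].query(l1 - 1, l2 - 1)
               and maxs[k].query(l1 - 1, l2 - 1) <= b[k]
               for k in range(len(a)))

index = build_index([(8, 1), (4, 2), (6, 3), (3, 4), (5, 5)])
print(predicate_P(index, (3, 1), (8, 4), 1, 4))  # True
```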

3  Handling One-Sided Queries for an Arbitrary Number
of Objects

In this section, we deal with temporal range queries for n objects with only one attribute,
that is d = 1. Let $v_i^j$ denote the value of object $O_i$ at time instance j. We want to preprocess
the data and construct a linear size data structure so that queries of the following type can
be answered quickly:

$Q_1$: Given a tuple $(l_1, l_2, a)$, with $l_1 \le l_2$, report all objects whose attributes are
greater than or equal to a for all time instances between $l_1$ and $l_2$.

We call such queries temporal one-sided reporting queries.

Observation 1 is again very important in answering queries of type $Q_1$. A straightforward
approach to solve our problem would be to determine for each possible time interval the set
of minimal values, one for each object, and store the minima corresponding to each time
interval in a sorted list. A query can then be immediately handled using the sorted list
corresponding to the time interval $[l_1..l_2]$. However, the storage cost would then be $O(nm^2)$,
which is quite high, especially in the case when m is much larger than n. We will develop an
alternative strategy that requires only linear space.

Assume that we have built a Cartesian tree $C_i$ for object $O_i$. Then, each attribute $v_i^j$ of
this object can be associated with a sequence of contiguous time instances during which $v_i^j$ is
the smallest. We call this sequence the dominant interval of $v_i^j$. In fact, the dominant interval
corresponds to the set of nodes in the subtree rooted at the node j in $C_i$. For example, the
value $v_i^4$ of the object i whose corresponding Cartesian tree is shown in Figure 1 is associated
with time interval [1, 5]. Let $[s_i^j..e_i^j]$ be the dominant interval of attribute $v_i^j$.
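Dominant intervals can be computed without materializing the Cartesian tree, since the subtree rooted at node j spans from just after the previous value that is less than or equal to $v_i^j$ to just before the next value that is strictly smaller. The following sketch, with the assumed function name `five_tuples`, computes them in one stack pass and emits the 5-tuples (value, start, end, object index, time instance) with 1-based time instances.

```python
def five_tuples(i, v):
    """Return the 5-tuples (v_j, s_j, e_j, i, j) of object i, j = 1..m.

    [s_j..e_j] is the dominant interval of the value at time instance j:
    the maximal time period around j on which that value is the minimum,
    with ties resolved in favour of the earliest time instance.
    """
    m = len(v)
    s, e = [1] * m, [m] * m
    stack = []  # 0-based positions whose right end is still open
    for j in range(m):
        # A strictly smaller value at position j closes intervals at j (1-based: j).
        while stack and v[stack[-1]] > v[j]:
            e[stack.pop()] = j
        # The interval starts right after the previous value <= v[j].
        s[j] = stack[-1] + 2 if stack else 1
        stack.append(j)
    return [(v[j], s[j], e[j], i, j + 1) for j in range(m)]

for t in five_tuples(1, [8, 4, 6, 3, 5, 1, 7, 2]):
    print(t)
```

Applied to the two value sequences of Table 1, this reproduces the 5-tuples listed there.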

Consider the set of nm tuples $(v_i^j, s_i^j, e_i^j, i, j)$. One way of answering a $Q_1$ query would be
to identify those 5-tuples that satisfy $[s_i^j..e_i^j] \supseteq [l_1..l_2]$ and $v_i^j \ge a$. However, an object can be
reported many times, which defeats our goal of achieving a query time of $O(\log^c(nm) + f)$,
where c is a small constant and f is the number of objects satisfying the query. Consider
for example the object given in Figure 1. A query with $l_1 = 2$, $l_2 = 3$, and $a = 0$ would
report it three times, for the 5-tuples that correspond to time instances 2, 4, and 6. In fact,
an object can be reported m times in the worst case.

This problem is taken care of by the next lemma.

Lemma 2 An object $O_i$ should be reported if and only if there exists a unique 5-tuple
$(v_i^j, s_i^j, e_i^j, i, j)$ such that the following conditions are true: $[s_i^j..e_i^j] \supseteq [l_1..l_2]$; $j \in [l_1..l_2]$; and
$v_i^j \ge a$.


Proof: 

Suppose an object $O_i$ needs to be reported. This means its values during the time period
$[l_1..l_2]$ are no smaller than a. Let $v_i^j = \min\{v_i^l \mid l_1 \le l \le l_2\}$, taking the earliest such j if the
minimum is attained more than once. It is obvious that the 5-tuple
$(v_i^j, s_i^j, e_i^j, i, j)$ satisfies the three conditions in Lemma 2. On the other hand, it is
straightforward to see that the existence of such a 5-tuple ensures the presence of object
$O_i$ in the answer to the query. The uniqueness of the 5-tuple $(v_i^j, s_i^j, e_i^j, i, j)$ is guaranteed
by the definition of dominant intervals $[s_i^j..e_i^j]$. Indeed, suppose we have another 5-tuple
$(v_i^{j'}, s_i^{j'}, e_i^{j'}, i, j')$ that satisfies $[s_i^{j'}..e_i^{j'}] \supseteq [l_1..l_2]$, $j' \in [l_1..l_2]$, and $v_i^{j'} \ge a$. By definition, both
$v_i^j$ and $v_i^{j'}$ are the smallest values during the time interval $[l_1..l_2]$. Without loss of generality,
assume $j < j'$; then $s_i^{j'} > j$, which contradicts the condition that $s_i^{j'} \le l_1 \le j$. □
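The effect of the extra condition $j \in [l_1..l_2]$ can be seen on the Figure 1 sequence. In the sketch below, `naive_matches` applies only the containment and threshold conditions, while `lemma2_matches` applies all three conditions of Lemma 2; both function names, and the helper `five_tuples`, are assumptions for illustration.

```python
def five_tuples(i, v):
    """5-tuples (value, s, e, i, j) with dominant interval [s..e], j 1-based."""
    m = len(v)
    s, e, stack = [1] * m, [m] * m, []
    for j in range(m):
        while stack and v[stack[-1]] > v[j]:
            e[stack.pop()] = j
        s[j] = stack[-1] + 2 if stack else 1
        stack.append(j)
    return [(v[j], s[j], e[j], i, j + 1) for j in range(m)]

def naive_matches(tuples, l1, l2, a):
    """Tuples whose dominant interval covers [l1..l2] and whose value is >= a."""
    return [t for t in tuples if t[1] <= l1 and l2 <= t[2] and t[0] >= a]

def lemma2_matches(tuples, l1, l2, a):
    """As above, but additionally requiring l1 <= j <= l2 (Lemma 2)."""
    return [t for t in tuples
            if t[1] <= l1 <= t[4] <= l2 <= t[2] and t[0] >= a]

tuples = five_tuples(1, [8, 4, 6, 3, 5, 1, 7, 8])   # the Figure 1 sequence
print([t[4] for t in naive_matches(tuples, 2, 3, 0)])  # [2, 4, 6]
print(lemma2_matches(tuples, 2, 3, 0))                 # [(4, 1, 3, 1, 2)]
```

The naive filter returns the three tuples at time instances 2, 4, and 6, whereas the Lemma 2 filter keeps only the tuple at time instance 2, whose value 4 is the minimum over [2..3].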

Lemma 2 reduces the problem of determining the objects satisfying the query to finding
a 5-tuple for each such object, which satisfies the three stated conditions. To solve the latter
problem, we first single out those attributes that were taken during the time period $[l_1..l_2]$
and then filter them using the remaining two conditions.

We first construct a balanced binary tree T based on the m time instances. The jth
leaf node starting from the left corresponds to time instance j. Each node v of this tree is
associated with a set S(v) of n tuples, one from each object. If v is the jth leaf node, then
$S(v) = \{(v_i^j, s_i^j, e_i^j, i, j) \mid i = 1, \ldots, n\}$. If v is an internal node with two children u and w
and the 5-tuples of object $O_i$ in S(u) and S(w) are $(v_i^{j_1}, s_i^{j_1}, e_i^{j_1}, i, j_1)$ and $(v_i^{j_2}, s_i^{j_2}, e_i^{j_2}, i, j_2)$
respectively, then the 5-tuple of object $O_i$ in S(v) is $(v_i^j, s_i^j, e_i^j, i, j)$, where j is either $j_1$ or $j_2$,
depending on whether $[s_i^{j_1}..e_i^{j_1}] \supseteq [s_i^{j_2}..e_i^{j_2}]$ or $[s_i^{j_2}..e_i^{j_2}] \supseteq [s_i^{j_1}..e_i^{j_1}]$. (The reason why one and
only one of the above conditions must be true should be easy to understand by recalling the
definition of dominant intervals.) To give an example, let us consider the case where n = 2
and m = 8. The values of the attributes of the two objects and the corresponding 5-tuples
are given in Table 1. Figure 2 gives the corresponding tree structure.
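The bottom-up definition of S(v) can be sketched as a recursive build over the time period [1..m]. The dictionary-based node layout and the names `wider` and `build` are assumptions made for illustration; the leaf tuples are copied from Table 1.

```python
# 5-tuples (value, s, e, i, j) of the two objects, as listed in Table 1.
OBJ1 = [(8, 1, 1, 1, 1), (4, 1, 3, 1, 2), (6, 3, 3, 1, 3), (3, 1, 5, 1, 4),
        (5, 5, 5, 1, 5), (1, 1, 8, 1, 6), (7, 7, 7, 1, 7), (2, 7, 8, 1, 8)]
OBJ2 = [(5, 1, 1, 2, 1), (2, 1, 3, 2, 2), (4, 3, 3, 2, 3), (1, 1, 8, 2, 4),
        (7, 5, 5, 2, 5), (3, 5, 8, 2, 6), (6, 7, 8, 2, 7), (8, 8, 8, 2, 8)]

def wider(t1, t2):
    """Of two tuples of one object, keep the one whose dominant interval
    contains the other's."""
    return t1 if (t1[1] <= t2[1] and t2[2] <= t1[2]) else t2

def build(lo, hi, per_object):
    """Build the node of T covering time instances [lo..hi] (1-based)."""
    node = {"lo": lo, "hi": hi, "left": None, "right": None}
    if lo == hi:
        node["S"] = [tuples[lo - 1] for tuples in per_object]
        return node
    mid = (lo + hi) // 2
    node["left"] = build(lo, mid, per_object)
    node["right"] = build(mid + 1, hi, per_object)
    node["S"] = [wider(t1, t2)
                 for t1, t2 in zip(node["left"]["S"], node["right"]["S"])]
    return node

root = build(1, 8, [OBJ1, OBJ2])
print(root["S"])
```

For the Table 1 data the root holds (1, 1, 8, 1, 6) for the first object and (1, 1, 8, 2, 4) for the second, that is, the tuple of each object's overall minimum.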

Given a $Q_1$ query $(l_1, l_2, a)$, we can easily find the set of $O(\log m)$ allocation nodes in
T, using the interval $[l_1..l_2]$. An allocation node is a node whose corresponding time interval
is fully contained in $[l_1..l_2]$ while that of its parent is not. If the query time interval is
[2..6], for the example given in Figure 2, then the allocation nodes are b, k, and l. For each
allocation node v, we know that all the n samples in S(v) are taken during the time period
$[l_1..l_2]$. Therefore, if a 5-tuple $(v_i^j, s_i^j, e_i^j, i, j) \in S(v)$ satisfies $[s_i^j..e_i^j] \supseteq [l_1..l_2]$ and $v_i^j \ge a$,
then $O_i$ should be reported. Otherwise, object $O_i$ should not be reported. In either case,
no further search on v's descendants is needed. This is true because of the following. First,
if $O_i$ is reported at node v, then there is no need to look for $O_i$ any more. Second, if $O_i$
is not reported at v, this means either $[s_i^j..e_i^j] \not\supseteq [l_1..l_2]$ or $v_i^j < a$. If the former is true,
then no tuple of $O_i$ stored in the descendants of v can cover $[l_1..l_2]$ because $[s_i^j..e_i^j]$ covers
the dominant intervals of all the other values of $O_i$ stored in the subtree rooted at v. If the
latter is true, then we are sure $O_i$ should not be reported at all.

        object O_1               object O_2
 j      value   5-tuple          value   5-tuple
 1      8       (8,1,1,1,1)      5       (5,1,1,2,1)
 2      4       (4,1,3,1,2)      2       (2,1,3,2,2)
 3      6       (6,3,3,1,3)      4       (4,3,3,2,3)
 4      3       (3,1,5,1,4)      1       (1,1,8,2,4)
 5      5       (5,5,5,1,5)      7       (7,5,5,2,5)
 6      1       (1,1,8,1,6)      3       (3,5,8,2,6)
 7      7       (7,7,7,1,7)      6       (6,7,8,2,7)
 8      2       (2,7,8,1,8)      8       (8,8,8,2,8)

Table 1: The values of two objects

Figure 2: The tree structure corresponding to the data given in Table 1. Each node contains
two 5-tuples, one from each object.

One final note is that, even though an object is represented multiple times in the form
of its tuples, it will be reported at most once. This can be justified as follows. If an object is
reported, then only one of its m tuples satisfies the conditions derived from the query. Note
that even though a tuple may be stored in up to log m nodes, these nodes form a partial path
from the root to a leaf node and, as a result, only the node at the highest level corresponding
to $[l_1..l_2]$ will be considered.
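The whole reporting procedure can be sketched end to end as follows. To keep the example short, each allocation node is scanned linearly, which costs O(n) per node; the structure described in this section would instead query a three-dimensional dominance reporting structure at each node. The names `five_tuples`, `build`, and `report` and the node layout are assumptions for illustration.

```python
def five_tuples(i, v):
    """5-tuples (value, s, e, i, j) with dominant interval [s..e], j 1-based."""
    m = len(v)
    s, e, stack = [1] * m, [m] * m, []
    for j in range(m):
        while stack and v[stack[-1]] > v[j]:
            e[stack.pop()] = j
        s[j] = stack[-1] + 2 if stack else 1
        stack.append(j)
    return [(v[j], s[j], e[j], i, j + 1) for j in range(m)]

def build(lo, hi, per_object):
    """Node of T for [lo..hi]; S holds, per object, the tuple with the widest
    dominant interval among the time instances below the node."""
    node = {"lo": lo, "hi": hi, "kids": ()}
    if lo == hi:
        node["S"] = [tuples[lo - 1] for tuples in per_object]
        return node
    mid = (lo + hi) // 2
    left, right = build(lo, mid, per_object), build(mid + 1, hi, per_object)
    node["kids"] = (left, right)
    node["S"] = [t1 if (t1[1] <= t2[1] and t2[2] <= t1[2]) else t2
                 for t1, t2 in zip(left["S"], right["S"])]
    return node

def report(node, l1, l2, a, out):
    """Append to out the objects with value >= a throughout [l1..l2]."""
    if node["hi"] < l1 or l2 < node["lo"]:
        return out                       # disjoint from the query period
    if l1 <= node["lo"] and node["hi"] <= l2:
        # Allocation node: filter S(v), then stop; never descend further.
        out.extend(i for (v, s, e, i, j) in node["S"]
                   if s <= l1 and l2 <= e and v >= a)
        return out
    for kid in node["kids"]:
        report(kid, l1, l2, a, out)
    return out

objects = [[8, 4, 6, 3, 5, 1, 7, 2], [5, 2, 4, 1, 7, 3, 6, 8]]  # Table 1 values
root = build(1, 8, [five_tuples(i + 1, v) for i, v in enumerate(objects)])
print(sorted(report(root, 2, 6, 1, [])))  # [1, 2]
```

On the Table 1 data with query (2, 6, 1), the second object is reported at the allocation node covering [3..4] and the first at the node covering [5..6], each exactly once.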

For each node v, looking for 5-tuples $(v_i^j, s_i^j, e_i^j, i, j) \in S(v)$ that satisfy $[s_i^j..e_i^j] \supseteq [l_1..l_2]$
and $v_i^j \ge a$ is equivalent to a three-dimensional dominance reporting problem, which can
be solved in $O(\log n + f(v))$ time using the data structure of Makris and Tsakalidis [11],
which we call the dominance tree. Here $f(v)$ is the number of objects reported when node
v is visited. Note that there are $2m - 1$ nodes in the tree and each node is associated with
a dominance tree of size $O(n)$. The overall size of the data structure is $O(nm)$. A query
process involves identifying the $O(\log m)$ allocation nodes in $O(\log m)$ time and searching the
dominance trees associated with these allocation nodes. Hence $O(\log n + f(v))$ time is spent
at each such node v. Therefore, the complexity of the overall algorithm is $O(\log n \log m + f)$,
where f is the total number of objects reported.

In [16], we provide a faster algorithm for solving the three-dimensional dominance query problem.
The algorithm uses $O(n \log^{\epsilon} n)$ space and $O(\log n / \log\log n + f)$ query time, where $\epsilon$ is an
arbitrarily small positive constant. Using this data structure instead of the dominance tree,
we can further reduce the query complexity to $O(\log m \log n / \log\log n + f)$ at the expense
of increasing the storage cost to $O(mn \log^{\epsilon} n)$. We thus have the following theorem.

Theorem 2 Given n objects, each specified by the values of its attribute at m time instances,
we can build an indexing structure so that any one-sided reporting query can be
answered in $O(\log n \log m + f)$ time and $O(nm)$ space, or $O(\log m \log n / \log\log n + f)$ time
and $O(mn \log^{\epsilon} n)$ space, where f is the number of objects satisfying the query and $\epsilon$ is an
arbitrarily small positive constant.

We  next  consider  the  counting  query  counterpart. 


4  Handling  One-Sided  Counting  Queries 

In  this  section,  we  consider  the  following  temporal  range  counting  queries. 

$Q_2$: Given a tuple $(l_1, l_2, a)$, with $l_1 \le l_2$, determine the number of objects whose
values are greater than or equal to a for all time instances between $l_1$ and
$l_2$.

The conditions stated in Lemma 2 (Section 3) can be expressed as $s_i^j \le l_1 \le j$, $j \le l_2 \le e_i^j$,
and $v_i^j \ge a$; and there is at most one such instance for each object. Hence the answer to the query is
$|A(l_1, l_2, a)|$, where $A(l_1, l_2, a) = \{(i,j) \mid s_i^j \le l_1 \le j,\ j \le l_2 \le e_i^j, \text{ and } v_i^j \ge a\}$.

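Let

$U(l_1, l_2, a) = \{(i,j) \mid v_i^j \ge a\}$,

$B_1(l_1, l_2, a) = \{(i,j) \mid l_2 < j \text{ and } v_i^j \ge a\}$,

$B_2(l_1, l_2, a) = \{(i,j) \mid l_2 > e_i^j \text{ and } v_i^j \ge a\}$,

$B_3(l_1, l_2, a) = \{(i,j) \mid l_1 < s_i^j \text{ and } v_i^j \ge a\}$,

$B_4(l_1, l_2, a) = \{(i,j) \mid l_1 > j \text{ and } v_i^j \ge a\}$,

$C_1(l_1, l_2, a) = \{(i,j) \mid l_1 < s_i^j,\ l_2 < j \text{ and } v_i^j \ge a\}$,

$C_2(l_1, l_2, a) = \{(i,j) \mid l_1 > j,\ l_2 < j \text{ and } v_i^j \ge a\}$,

$C_3(l_1, l_2, a) = \{(i,j) \mid l_1 < s_i^j,\ l_2 > e_i^j \text{ and } v_i^j \ge a\}$, and

$C_4(l_1, l_2, a) = \{(i,j) \mid l_1 > j,\ l_2 > e_i^j \text{ and } v_i^j \ge a\}$.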

We  have  the  following  lemma: 

Lemma 3 $|A| = |U| - |B_1| - |B_2| - |B_3| - |B_4| + |C_1| + |C_2| + |C_3| + |C_4|$.

Proof: 

It is easy to see that $\bar{A} = U - A = B_1 \cup B_2 \cup B_3 \cup B_4$. Thus,
$|\bar{A}| = \sum_{i} |B_i| - \sum_{i<j} |B_i \cap B_j| + \sum_{i<j<k} |B_i \cap B_j \cap B_k| - |B_1 \cap B_2 \cap B_3 \cap B_4|$,
where all indices range over {1, 2, 3, 4}. It is clear that the
third and the fourth terms on the right-hand side of this equation are both zero. As for
the second term, the only four non-empty intersections are $B_1 \cap B_3$, $B_1 \cap B_4$, $B_2 \cap B_3$, and
$B_2 \cap B_4$, which correspond to the sets $C_1$, $C_2$, $C_3$, $C_4$ respectively. The lemma follows since $|A| = |U| - |\bar{A}|$. □
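The identity of Lemma 3 can be checked numerically on the data of Table 1. The sketch below counts each set by a linear scan, whereas the data structure of this section would obtain each count from a dominance counting query; the function names are assumptions, and the tuples are copied from Table 1. In this sketch the set $C_2$ always comes out empty when $l_1 \le l_2$, so its term contributes zero and the identity is unaffected.

```python
# 5-tuples (value, s, e, i, j) of both objects, as listed in Table 1.
TUPLES = [
    (8, 1, 1, 1, 1), (4, 1, 3, 1, 2), (6, 3, 3, 1, 3), (3, 1, 5, 1, 4),
    (5, 5, 5, 1, 5), (1, 1, 8, 1, 6), (7, 7, 7, 1, 7), (2, 7, 8, 1, 8),
    (5, 1, 1, 2, 1), (2, 1, 3, 2, 2), (4, 3, 3, 2, 3), (1, 1, 8, 2, 4),
    (7, 5, 5, 2, 5), (3, 5, 8, 2, 6), (6, 7, 8, 2, 7), (8, 8, 8, 2, 8),
]

def count_direct(tuples, l1, l2, a):
    """|A|: tuples with s <= l1 <= j <= l2 <= e and value >= a."""
    return sum(1 for v, s, e, i, j in tuples
               if s <= l1 <= j <= l2 <= e and v >= a)

def count_by_lemma3(tuples, l1, l2, a):
    """|A| via |U| - sum |B_k| + sum |C_k|, each term a simple count."""
    u = [t for t in tuples if t[0] >= a]                      # U
    n = lambda pred: sum(1 for v, s, e, i, j in u if pred(s, e, j))
    b1 = n(lambda s, e, j: l2 < j)
    b2 = n(lambda s, e, j: l2 > e)
    b3 = n(lambda s, e, j: l1 < s)
    b4 = n(lambda s, e, j: l1 > j)
    c1 = n(lambda s, e, j: l1 < s and l2 < j)
    c2 = n(lambda s, e, j: l1 > j and l2 < j)   # empty whenever l1 <= l2
    c3 = n(lambda s, e, j: l1 < s and l2 > e)
    c4 = n(lambda s, e, j: l1 > j and l2 > e)
    return len(u) - b1 - b2 - b3 - b4 + c1 + c2 + c3 + c4

print(count_direct(TUPLES, 2, 6, 1), count_by_lemma3(TUPLES, 2, 6, 1))  # 2 2
```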

The problem of determining the size of each of the sets U, $B_i$ or $C_i$ can be viewed as a
special version of the three-dimensional dominance counting problem defined as follows:

$Q'_2$: Given a set V of n three-dimensional points, preprocess V so that given a
point (x, y, z), the number of points in V that are dominated by (x, y, z) can
be reported efficiently.

Unlike the reporting case, to the best of the authors' knowledge no algorithm for the
three-dimensional dominance counting problem is known that has linear space and
polylogarithmic query time. However, Chazelle gives a linear space and $O(\log n)$ time algorithm [3] for the
two-dimensional case. Using the scheme of the range tree, his result can easily be extended
to the three-dimensional case by first building a binary search tree on the x-coordinates, and
then associating with each node the data structure for answering two-dimensional dominance
queries involving only the y- and z-coordinates. The resulting data structure provides an
$O(n \log n)$ space and $O(\log^2 n)$ time solution.
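The range-tree idea of attaching a secondary structure to every node of a tree built on one coordinate can be illustrated one dimension lower. The sketch below counts two-dimensional dominated points with a tree on the x-order whose nodes each store a sorted list of y-coordinates, searched by binary search. It is a simplified stand-in: it is neither Chazelle's structure [3] nor the three-dimensional extension described above, and the class name `DominanceCounter2D` is an assumption. It uses O(n log n) space and O(log^2 n) query time.

```python
from bisect import bisect_right

class DominanceCounter2D:
    """Count points (x, y) with x <= qx and y <= qy."""

    def __init__(self, points):
        pts = sorted(points)                      # order by x, then y
        self.n = len(pts)
        self.xs = [p[0] for p in pts]
        # Bottom-up segment tree over the x-order; node k stores the sorted
        # y-coordinates of the points in its range.
        self.tree = [[] for _ in range(2 * self.n)]
        for k, p in enumerate(pts):
            self.tree[self.n + k] = [p[1]]
        for k in range(self.n - 1, 0, -1):
            self.tree[k] = sorted(self.tree[2 * k] + self.tree[2 * k + 1])

    def count(self, qx, qy):
        # Points with x <= qx form a prefix of the x-order; split that prefix
        # into O(log n) tree nodes and binary-search y in each of them.
        lo, hi = self.n, self.n + bisect_right(self.xs, qx)
        total = 0
        while lo < hi:
            if lo & 1:
                total += bisect_right(self.tree[lo], qy)
                lo += 1
            if hi & 1:
                hi -= 1
                total += bisect_right(self.tree[hi], qy)
            lo //= 2
            hi //= 2
        return total

points = [(1, 5), (2, 3), (3, 8), (4, 1), (5, 6), (2, 7), (4, 4)]
counter = DominanceCounter2D(points)
print(counter.count(4, 5))  # 4
```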

By using the fusion tree techniques, we were able to improve the query time to
$O((\log n / \log\log n)^2)$ at the expense of increasing the storage cost by a factor of
$O(\log^{\epsilon} n / \log\log n)$. For details, see [16]. Since we have a total of nm tuples, Theorem 3
follows.

Theorem 3 Given n objects, each characterized by the values of its attribute at m time
instances, we can preprocess the input so that any one-sided counting query can be answered
in $O(\log^2(nm))$ time using an $O(nm \log(nm))$ space data structure, or
$O((\log(nm) / \log\log(nm))^2)$ time using an $O(nm \log^{1+\epsilon}(nm) / \log\log(nm))$ space data
structure.


5  Fast  Algorithms  for  Handling  Two-Sided  Queries 

In this section, we address the general type of queries for which the values of the objects to
be reported are bounded between two values a and b during the time period $[l_1..l_2]$. More
specifically,

Q3: Given a tuple $(l_1, l_2, a, b)$, with $l_1 < l_2$ and $a < b$, report all objects $O_i$ such
that $a \le v_i^j \le b$ for all $j = l_1, \ldots, l_2$.

The following is a direct extension of Observation 1.

Observation 2 An object $O_i$ should be reported for a Q3 query if and only if
$\min\{v_i^j \mid j \in [l_1..l_2]\} \ge a$ and $\max\{v_i^j \mid j \in [l_1..l_2]\} \le b$.

In this section, we first show that, even for an arbitrary number d of attributes,
two-sided queries can be handled fast if we are willing to use $O(nm^2\,\mathrm{polylog}(n))$ space for
the indexing structure. We later show that we can achieve fast query time using $o(nm^2)$
space in the case when m is extremely large. We start by looking at the case when d = 1,
which admits a simple solution.

To achieve a polylogarithmic query time, we compute for each pair $(t_1, t_2) \in [1..m] \times
[1..m]$ with $t_1 < t_2$ the minimum value $m_i^{t_1,t_2}$ and maximum value $M_i^{t_1,t_2}$ for each object
$O_i$ and index the n minimum-maximum pairs in a suitable data structure $T^{t_1,t_2}$ designed to
efficiently handle two-dimensional dominance queries. Pointers to these $O(m^2)$ structures
can be stored in an array to allow constant-time access. Given any query $(l_1, l_2, a, b)$, we use
$(l_1, l_2)$ to locate the appropriate data structure $T^{l_1,l_2}$ in constant time and use it to answer
the two-dimensional dominance query: $m_i^{l_1,l_2} \ge a$ and $M_i^{l_1,l_2} \le b$.

A possible data structure for $T^{t_1,t_2}$ is the priority tree [13] or the improved version of the
priority tree that appeared in [22]. The former allows $O(\log n + f)$ query time and the latter
allows $O(\log n / \log\log n + f)$ query time, both using linear space.
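
As a hedged illustration of this scheme for d = 1, the following Python sketch builds one structure per pair of time instances and answers a query with a single lookup. The function names (`build_pst`, `query_pst`, `build_index`, `two_sided_query`), the 0-based time indices, and the dictionary used in place of the array of pointers are assumptions of the sketch. The tree is a plain textbook priority search tree, not the specific structures of [13] or [22].

```python
def build_pst(points):
    """points: list of (x, y, obj) sorted by x. Returns nested tuples, or None."""
    if not points:
        return None
    # The point with the smallest y becomes the root (heap order on y).
    k = min(range(len(points)), key=lambda i: points[i][1])
    rest = points[:k] + points[k + 1:]          # still sorted by x
    mid = len(rest) // 2
    left, right = rest[:mid], rest[mid:]
    left_max_x = left[-1][0] if left else float("-inf")
    return (points[k], left_max_x, build_pst(left), build_pst(right))


def query_pst(node, a, b, out):
    """Append to out every obj whose point satisfies x >= a and y <= b."""
    if node is None or node[0][1] > b:          # every y below the root is larger
        return
    (x, y, obj), left_max_x, left, right = node
    if x >= a:
        out.append(obj)
    if left_max_x >= a:                         # left side may still hold x >= a
        query_pst(left, a, b, out)
    query_pst(right, a, b, out)


def build_index(values):
    """values[i][j] is the attribute of object i at time j (0-based)."""
    n, m = len(values), len(values[0])
    index = {}
    for t1 in range(m):
        lo = [row[t1] for row in values]        # running minimum over [t1..t2]
        hi = list(lo)                           # running maximum over [t1..t2]
        for t2 in range(t1 + 1, m):
            for i in range(n):
                lo[i] = min(lo[i], values[i][t2])
                hi[i] = max(hi[i], values[i][t2])
            # x = minimum, y = maximum, so the query is x >= a and y <= b.
            index[(t1, t2)] = build_pst(sorted((lo[i], hi[i], i) for i in range(n)))
    return index


def two_sided_query(index, l1, l2, a, b):
    out = []
    query_pst(index[(l1, l2)], a, b, out)
    return sorted(out)
```

For example, with `values = [[5, 7, 6], [1, 8, 9], [4, 4, 4]]`, the call `two_sided_query(build_index(values), 0, 1, 4, 7)` reports objects 0 and 2, whose values stay within [4, 7] at times 0 and 1.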

We can handle counting queries in a similar fashion, using as $T^{t_1,t_2}$ Chazelle's linear space
data structure to achieve $O(\log n)$ query complexity or the one in [16] with $O(n \log^{\epsilon} n)$ space
and $O(\log n / \log\log n)$ query time. Since we have $m(m-1)/2$ $(t_1, t_2)$-pairs, Theorem 4
follows.

Theorem 4 Given n objects, each of which is specified by the values of its attribute at m
time instances, it is possible to design an indexing structure so that the reporting version of
any two-sided query can be answered in $O(\log n / \log\log n + f)$ time using $O(nm^2)$ space for
the indexing structure. The counting version can be handled in $O(nm^2)$ space and $O(\log n)$
query time, or $O(nm^2 \log^{\epsilon} n)$ space and $O(\log n / \log\log n)$ query time.
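
The space bounds in the theorem can be checked by a direct count: one structure is stored for each of the $m(m-1)/2$ pairs of time instances, and each structure has the per-pair size quoted in the preceding paragraphs. The display below is only this arithmetic written out, under the assumption that the per-pair structures share nothing with one another.

```latex
\[
\underbrace{\frac{m(m-1)}{2}}_{\text{pairs } (t_1,\,t_2)} \cdot O(n) \;=\; O(nm^2),
\qquad
\frac{m(m-1)}{2} \cdot O\!\left(n \log^{\epsilon} n\right) \;=\; O\!\left(nm^2 \log^{\epsilon} n\right).
\]
```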

The strategy described above can be extended to handle an arbitrary number d of
attributes describing each object. Our general problem will be reduced to $O(m^2)$
$2d$-dimensional dominance queries. Using the results of [16], we obtain the following theorem.

Theorem 5 The general temporal range query problem, with n objects, each with d > 1
attributes specified at m time instances, can be handled with a data structure of size
$O(m^2 \cdot n \log^{\epsilon} n \,(\log n / \log\log n)^{2d-3})$ and a query time of $O((\log n / \log\log n)^{2d-2} + f)$.
The counting query can be handled in $O((\log n / \log\log n)^{2d-1})$ time using
$O(m^2 \cdot n \log^{\epsilon} n \,(\log n / \log\log n)^{2d-2})$ space.

Clearly the space used to handle two-sided queries, even in the case when d = 1, is
quite high. An interesting problem is whether there exists a data structure whose size is
$o(nm^2)$, such that the general temporal range search problem can be solved in time that
is polylogarithmic in nm and proportional to the number of objects found. We provide a
partial answer to this question by showing that this is indeed the case when m is extremely
large.

Theorem 6 Given n objects, each characterized by the values of its attribute at m time
instances such that $m > n!$, it is possible to design an indexing structure such that the
reporting version of any two-sided query can be answered in $O(\log^{c} n + f)$ time using
$o(nm^2)$ space.

Proof:

For each pair of time instances $j_1$ and $j_2$, let $m_i^{j_1,j_2} = \min\{v_i^j \mid j \in [j_1..j_2]\}$ and
$M_i^{j_1,j_2} = \max\{v_i^j \mid j \in [j_1..j_2]\}$. Let $(i_1^{j_1,j_2}, i_2^{j_1,j_2}, \ldots, i_n^{j_1,j_2})$ be the permutation of
$(1, 2, \ldots, n)$ such that

$$m_{i_1^{j_1,j_2}}^{j_1,j_2} \le m_{i_2^{j_1,j_2}}^{j_1,j_2} \le \cdots \le m_{i_n^{j_1,j_2}}^{j_1,j_2}.$$

Similarly, let $(I_1^{j_1,j_2}, I_2^{j_1,j_2}, \ldots, I_n^{j_1,j_2})$ be the permutation of $(1, 2, \ldots, n)$ such that

$$M_{I_1^{j_1,j_2}}^{j_1,j_2} \le M_{I_2^{j_1,j_2}}^{j_1,j_2} \le \cdots \le M_{I_n^{j_1,j_2}}^{j_1,j_2}.$$

We define two mappings $f^{j_1,j_2}$ and $F^{j_1,j_2}$ such that $f^{j_1,j_2}(i_k^{j_1,j_2}) = k$ and
$F^{j_1,j_2}(I_k^{j_1,j_2}) = k$ for $k = 1, 2, \ldots, n$. Thus an object $O_i$
corresponds to two numbers $f^{j_1,j_2}(i)$ and $F^{j_1,j_2}(i)$ that give the ranks of $O_i$ for the
time period $[j_1..j_2]$ with regard to its minimum and maximum values respectively. In other
words, the point $(f^{j_1,j_2}(i), F^{j_1,j_2}(i))$ is the representation of object $O_i$ in the two-dimensional
rank space corresponding to the time period $[j_1..j_2]$.

Note that there are $n!$ permutations of $(1, 2, \ldots, n)$. Therefore at most
$(n!)^2$ different point sets are possible for each pair of $j_1$ and $j_2$. During preprocessing,
we simply build one priority tree for each possible point set and construct an array of
$m^2$ entries that indicate for each pair $(j_1, j_2)$ the corresponding priority tree.
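
The rank-space representation and the sharing of structures between time periods can be sketched as follows. The names `rank_points` and `shared_point_sets`, the 0-based time indices, and the tie-breaking by object index are assumptions of this sketch. Unlike the construction in the proof, it only records the point sets that actually occur instead of building a tree for every possible one.

```python
def rank_points(values, j1, j2):
    """Return the rank-space point (f(i), F(i)) of every object for period [j1..j2]."""
    n = len(values)
    mins = [min(row[j1:j2 + 1]) for row in values]
    maxs = [max(row[j1:j2 + 1]) for row in values]
    f, F = [0] * n, [0] * n
    # Rank by minimum value (ties broken by object index, ranks start at 1).
    for k, i in enumerate(sorted(range(n), key=lambda i: mins[i]), start=1):
        f[i] = k
    # Rank by maximum value.
    for k, i in enumerate(sorted(range(n), key=lambda i: maxs[i]), start=1):
        F[i] = k
    return tuple(zip(f, F))


def shared_point_sets(values):
    """Map each pair (j1, j2) to the id of its point set; equal sets share an id."""
    m = len(values[0])
    distinct = {}                       # point set -> id of the structure built for it
    table = {}                          # (j1, j2)  -> id, standing in for the m^2 array
    for j1 in range(m):
        for j2 in range(j1 + 1, m):
            pts = rank_points(values, j1, j2)
            table[(j1, j2)] = distinct.setdefault(pts, len(distinct))
    return table, distinct
```

With `values = [[5, 7, 6], [1, 8, 9], [4, 4, 4]]`, `rank_points(values, 0, 1)` gives `((3, 2), (1, 3), (2, 1))`: object 0 has the largest minimum and the second smallest maximum over the first two time instances.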

Since the query is given as $(l_1, l_2, a, b)$, we have to map the numbers a and b to the rank
space of $(l_1, l_2)$ before the corresponding priority tree can be searched. Let $a^{l_1,l_2}$ and $b^{l_1,l_2}$
be the parameters used to search the appropriate priority tree. Then $a^{l_1,l_2}$ is equal to the
number of objects whose values are always greater than or equal to a during the time period
$[l_1..l_2]$, and $b^{l_1,l_2}$ is equal to the number of objects whose values are always less than or
equal to b in that period. These two numbers can be computed independently using the results
in Section 4. Even without using the fusion tree, this step can still be done in $O(\log^2(nm))$
time using $O(nm \log(nm))$ space.
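
To make the role of the two counts concrete, the following Python sketch answers a two-sided query purely in rank space. The function name `two_sided_via_ranks` and the 0-based time indices are assumptions, the two counts are computed by brute force as stand-ins for the one-sided counting queries of Section 4, and a linear scan replaces the priority tree search. The point it illustrates is that objects whose minimum is at least a occupy the top ranks by minimum, and objects whose maximum is at most b occupy the bottom ranks by maximum.

```python
def two_sided_via_ranks(values, l1, l2, a, b):
    """Report objects i with a <= values[i][j] <= b for all j in [l1..l2] (0-based)."""
    n = len(values)
    mins = [min(row[l1:l2 + 1]) for row in values]
    maxs = [max(row[l1:l2 + 1]) for row in values]

    # Ranks 1..n of each object by minimum (f) and by maximum (F).
    f, F = [0] * n, [0] * n
    for k, i in enumerate(sorted(range(n), key=lambda i: mins[i]), start=1):
        f[i] = k
    for k, i in enumerate(sorted(range(n), key=lambda i: maxs[i]), start=1):
        F[i] = k

    # Brute-force stand-ins for the two one-sided counting queries.
    count_a = sum(lo >= a for lo in mins)    # objects that are always >= a
    count_b = sum(hi <= b for hi in maxs)    # objects that are always <= b

    # Objects with minimum >= a hold the top count_a ranks of f;
    # objects with maximum <= b hold the bottom count_b ranks of F.
    return [i for i in range(n) if f[i] > n - count_a and F[i] <= count_b]
```

The final filter is a two-dimensional dominance condition on the rank points, which is the query a priority tree over those points would answer in time proportional to $\log n$ plus the output size.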

The storage cost for the priority trees and the array is $O(m^2 + n(n!)^2 + nm \log(nm)) = o(nm^2)$.
Therefore the total storage cost is $o(nm^2)$. After the ranks of a and b are determined,
the query can be answered in $O(\log n + f)$ time. Thus the total computational time is
$O(\log^2(nm) + f)$. □

6  Conclusion 

We  have  introduced  in  this  paper  a  general  class  of  problems  involving  temporal  range 
queries,  which  seems  to  be  widely  applicable.  We  have  shown  that  this  problem  can  be 
reduced  to  a  number  of  multidimensional  dominance  search  problems,  and  hence  can  in 
principle  be  solved  fast  using  nonlinear  space  data  structures.  Special  cases  for  one-sided 
queries were shown to admit elegant solutions using linear size data structures and polylogarithmic
query time. A simple intriguing problem is whether the two-sided version for d = 1
can  be  solved  in  polylogarithmic  time  using  linear  space.  Note  that  this  problem  can  easily 
be  reduced  to  solving  the  one-sided  version  for  d  =  2,  and  hence  it  is  somewhat  the  easiest 
problem  to  tackle  next. 